Okay, so welcome back. We're still looking at model-based agents that use probability distributions as their world representation, and we're still trying to design agents that have a good model of the world without having to pay a huge computational cost. We've learned a little probability theory, we've learned about various techniques for probabilistic reasoning, and now we're trying to get a handle on doing this efficiently. The first kind of model we were looking at was the naive Bayes model: situations where we have a couple of random variables, in this example three of them, with one cause, here Cavity, and n many effect variables, where n can be two as in this example or 300,000 as in the classification example, that are conditionally independent given the cause. Okay, so that's what a naive Bayes model is.
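As a compact restatement of that assumption (my own rendering, not verbatim from the slides), with cause C and effect variables E_1, ..., E_n, conditional independence given the cause says:

```latex
% Conditional independence of the effects given the cause:
% once C (e.g. Cavity) is known, any other effect E_j tells us nothing new about E_i.
P(E_i \mid C, E_1, \dots, E_{i-1}, E_{i+1}, \dots, E_n) = P(E_i \mid C)
\quad \text{for all } i
```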
We're interested in using this situation as an efficient way of representing the full joint probability distribution. If you think about the dentistry example, we have three random variables, all of them Boolean, so we have 2^3 entries in our full joint probability distribution. That's fine for an example on lecture slides, because we want things to be small. But if we think about the other example, where we classify newspaper articles into, say, ten classes, you remember those, and we have word counts, which are also bigger random variables, then given, say, 300,000 words in English we get a full joint probability distribution on the order of 10^300,000 entries. That's not something we want to store on our little laptops, or even in a server farm; it would be far too expensive. So the name of the game is not to go via the theoretical tool, the mathematical tool of the full joint probability distribution, but to compute those things that we need when we need them, and for this model that is very easy.
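To make the storage cost concrete, here is a minimal sketch (not from the lecture; the variable names follow the dentistry example, the numbers are only illustrative) of what storing a full joint distribution over Boolean variables looks like, and how it blows up:

```python
from itertools import product

# Full joint distribution over three Boolean variables
# (Cavity, Toothache, Catch): one probability per combination
# of truth values, so 2**3 = 8 entries. Numbers are illustrative.
full_joint = {
    (True,  True,  True):  0.108,
    (True,  True,  False): 0.012,
    (True,  False, True):  0.072,
    (True,  False, False): 0.008,
    (False, True,  True):  0.016,
    (False, True,  False): 0.064,
    (False, False, True):  0.144,
    (False, False, False): 0.576,
}

assert len(full_joint) == 2 ** 3
assert abs(sum(full_joint.values()) - 1.0) < 1e-9

# The table grows exponentially: n Boolean variables need 2**n entries,
# so a joint over hundreds of thousands of variables is hopeless to store.
for n in (3, 20, 50):
    print(n, "variables ->", 2 ** n, "entries")
```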
So we looked at these naive Bayes models, and the upshot, the most important thing, is that somewhere in these computations we have to use the chain rule. That is a good thing, because it allows us to take all these conjunctions of random variables and express and compute them via conditional probabilities, which we are much more likely to have. And the wonderful thing is that we never need conditional probabilities with 300,000 variables in the condition, as in the big example; we only ever need one. Why? Because in a naive Bayes model we have the cause and lots of evidence variables, and we only need P(E_i | C), each evidence variable conditioned on the single cause. That makes every individual term much smaller, and we are left with a relatively simple multiplication, still 300,000 factors long, which is still work, but it is a lot less work, and above all it needs a lot less data.
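Written out (my reconstruction of the step the lecture refers to), the chain rule followed by the conditional independence assumption gives the naive Bayes factorization:

```latex
% Chain rule over the cause C and the effects E_1, ..., E_n:
P(C, E_1, \dots, E_n)
  = P(C)\, P(E_1 \mid C)\, P(E_2 \mid C, E_1) \cdots P(E_n \mid C, E_1, \dots, E_{n-1})
% Conditional independence given C collapses every factor to a single condition:
  = P(C) \prod_{i=1}^{n} P(E_i \mid C)
```

So instead of a table with exponentially many entries, we only need P(C) and one small table P(E_i | C) per effect variable, which is also why far less data is needed to estimate them.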
So that's the situation, and we've basically worked through this and discovered a couple of things. We've discovered that if we have evidence variables we can actually observe and others that are unknown, we can get rid of the unknowns by summing over all of their possible values. That's called marginalization: we marginalize out the things we don't know, and that's how we deal with the unknowns.
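As a small sketch of marginalization (again my own illustration, reusing the dentistry-style joint table from above), summing out the unobserved variable looks like this:

```python
# Marginalization: to get P(Cavity, Toothache) we sum the full joint
# over all values of the variable we did not observe (Catch).
# Keys are (cavity, toothache, catch); the numbers are illustrative.
full_joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def p_cavity_toothache(cavity: bool, toothache: bool) -> float:
    """P(Cavity=cavity, Toothache=toothache), with Catch summed out."""
    return sum(
        p for (c, t, catch), p in full_joint.items()
        if c == cavity and t == toothache
    )

print(p_cavity_toothache(True, True))   # 0.108 + 0.012 = 0.12
```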
The next thing, which always feels a little like magic to me, is the normalization step. We want to end up with a probability distribution, we have to, because the left-hand side has to sum to one. We end up with something that doesn't sum to one, but we know it has to, and we know there is this funny divisor that is always the same. We interpret that as a normalization constant, which we can compute from the factor by which our result is off from one. So we can push a lot of the work into this normalization constant, and it solves some of our problems by magic, at least that's what it feels like to me. If you really want to understand what's going on, take your fingers and work through all of this, and you'll see that it works. If you're like me, you'll immediately forget why it works; I can retain the understanding for a very limited time, but I've convinced myself that it actually works, and I do it before every lecture, and then I'm always scared that I'll get it wrong. So it's a little bit of magic.
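The normalization step can be sketched the same way (my own illustration): compute the unnormalized values, then divide by their sum, which is exactly the "funny divisor" that is always the same:

```python
# Normalization: P(Cavity | toothache) is proportional to P(Cavity, toothache).
# The shared divisor P(toothache) never has to be computed separately,
# because it is just the sum of the unnormalized values.
unnormalized = {
    True:  0.12,   # P(cavity, toothache), e.g. from the marginalization above
    False: 0.08,   # P(not cavity, toothache)
}

alpha = 1.0 / sum(unnormalized.values())           # the normalization constant
posterior = {v: alpha * p for v, p in unnormalized.items()}

print(posterior)                                   # {True: 0.6, False: 0.4}
assert abs(sum(posterior.values()) - 1.0) < 1e-9   # a proper distribution again
```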
Okay, good. We looked at this in the dentistry example, and we looked at the classification example, which was essentially there to show you that we often have loads of variables, hundreds of thousands of them, and this still works, no problem, and solves real-world problems.
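To connect this back to the classification example, here is a minimal, hedged sketch of how a naive Bayes text classifier only ever touches P(class) and P(word | class), never the huge joint. The class names, word lists, and probabilities below are invented for illustration:

```python
import math

# Tiny naive Bayes classifier sketch: the category is the cause,
# word occurrences are the conditionally independent evidence variables.
prior = {"sports": 0.5, "politics": 0.5}                  # P(C)
p_word_given_class = {                                    # P(word | C)
    "sports":   {"goal": 0.05,  "election": 0.001, "team": 0.04},
    "politics": {"goal": 0.005, "election": 0.03,  "team": 0.01},
}

def log_score(cls: str, words: list[str]) -> float:
    """log P(C) + sum_i log P(word_i | C); logs avoid underflow
    when there are hundreds of thousands of factors."""
    score = math.log(prior[cls])
    for w in words:
        # tiny floor for words never seen with this class
        score += math.log(p_word_given_class[cls].get(w, 1e-6))
    return score

doc = ["goal", "team", "goal"]
print(max(prior, key=lambda c: log_score(c, doc)))        # -> "sports"
```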
Okay, so we want to look at... unless there are any questions. Are there questions? So I would like to look at...